{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f933fcd8",
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a440130e",
   "metadata": {},
   "source": [
    "# Detecting audio issues in a condition monitoring dataset (audio)\n",
    "This notebook aims at detecting issues in a **condition monitoring** dataset using **audio data**. As a basis it uses the DCASE challenge dataset where the goal is to detect if a machine is in a defect state or not.\n",
    "\n",
    "In order to run the example install the **dependencies** as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a7de131",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed83733e",
   "metadata": {},
   "source": [
    "## Step 1: Download the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d13a504",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import datasets to download the data\n",
    "import datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c05bfab2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the dataset and convert to pandas dataframe\n",
    "dataset = datasets.load_dataset(\n",
    "        \"renumics/dcase23-task2-enriched\", \"dev\", split=\"all\", streaming=False\n",
    "    )\n",
    "df = dataset.to_pandas()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b078877",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sample the dataset randomly to make the example run faster\n",
    "df = df.sample(1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b49cf9f",
   "metadata": {},
   "source": [
    "# Step 2: Detect problematic data slices based on audio data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef4da180",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The imports\n",
    "from sklearn.metrics import accuracy_score\n",
    "from renumics.spotlight import Audio\n",
    "from sliceguard import SliceGuard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90b5d870",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run slice detection based on general purpose audio embeddings (pretrained model trained on Audioset)\n",
    "sg = SliceGuard()\n",
    "issues = sg.find_issues(\n",
    "    df,\n",
    "    [\"path\"],\n",
    "    \"label\",\n",
    "    \"dev_train_lof_anomaly\",\n",
    "    accuracy_score,\n",
    "    metric_mode=\"max\",\n",
    "    min_support=5,\n",
    "    min_drop=0.2\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a002e7c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Report the results in Renumics Spotlight\n",
    "sg.report(spotlight_dtype={\"path\": Audio})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eacd6c92",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.17"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
